ABSTRACT

Texts often suggest running preliminary tests for
homogeneity of variance prior to running an ANOVA. While it has been
known for some time that most of the suggested tests are probably not
appropriate, they are still being used. This paper is a review of the
literature in terms of the implications involved in running
preliminary tests in general and various ones in particular: Cochran,
Hartley, Box and Andersen, Bartlett, Levene. It re-emphasizes the
need to attain equal cell sizes and suggests the appropriateness of
the Welch test when that is not possible. The paper looks at the
difference in assumptions which must be met in the fixed and random
effects models, in a one-way design. (Author)







REVIEW OF PROBLEMS OF TESTING 
FOR HOMOGENEITY 
PRIOR TO RUNNING AN ANOVA 



Elizabeth C. Proper 



University of Massachusetts






U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
OFFICE OF EDUCATION
THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR
OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF
EDUCATION POSITION OR POLICY.






(Paper presented at NEERO Annual Conference, Chestnut Hill, Mass., June 1971.) 






An assumption underlying the analysis of variance is homogeneity of 
variance. This paper is concerned with the random and fixed effects models 
in the one way design. It discusses briefly the scope and limitations of 
various tests which have been suggested as preliminary tests as well as 
other tests which are available but are not often suggested. An examina- 
tion is then made of the appropriateness of the use of such a preliminary 
test. 

While most statistics books deal with the problem of homogeneity,
they treat it as being of varying importance and thus provide responses
to it and suggestions for dealing with it at differing levels. This
paper will point out problems involved and will identify various
inadequacies in solutions which have been proposed for handling the
assumption. The problem is important because homogeneity is a potential
issue every time that one uses analysis of variance. The tests to be
discussed are the Hartley, Cochran, Bartlett, Wald sequential analysis,
Bartlett and Kendall, Box and Andersen, and Levene.

The test proposed by Hartley, the $F_{\max}$ test (Winer, 1962),

$$F_{\max} = \frac{\text{largest of } k \text{ treatment variances}}{\text{smallest of } k \text{ treatment variances}},$$

perhaps the easiest test to compute, has as its parameters n - 1
degrees of freedom and the k treatments; a special table exists for
interpretation. The n in the degrees of freedom is the sample size
within each treatment group; therefore, the test requires equal n's.
In a case of slight inequality, the largest of the n's may be used; this
will result in the null being rejected more often than it should be; in
other words, such usage will result in a slight positive bias.

Winer (1962) points out that this test makes use of what is
equivalent to the range of the sample variances. Making use of only
two pieces of data, it is not a sufficient statistic. The problem of
sensitivity to non-normality of the $F_{\max}$ test will be handled in
conjunction with the other tests.
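The computation of the Hartley ratio can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `hartley_f_max` and the data are hypothetical, and the resulting ratio would still have to be referred to the special table mentioned above, which is not reproduced here.

```python
from statistics import variance


def hartley_f_max(groups):
    """Return (F_max, k, df) for k equal-sized treatment groups."""
    sizes = {len(g) for g in groups}
    if len(sizes) != 1:
        # The test as described requires equal n's.
        raise ValueError("F_max requires equal group sizes")
    variances = [variance(g) for g in groups]  # unbiased sample variances
    n = sizes.pop()
    return max(variances) / min(variances), len(groups), n - 1


# Hypothetical data: three treatments with n = 5 each.
groups = [
    [1.0, 2.0, 3.0, 4.0, 5.0],   # variance 2.5
    [2.0, 4.0, 6.0, 8.0, 10.0],  # variance 10.0
    [1.0, 1.0, 2.0, 3.0, 3.0],   # variance 1.0
]
f_max, k, df = hartley_f_max(groups)
```

Only the largest and smallest variances enter the ratio, which is the sense in which the statistic uses two pieces of data.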

The Cochran test (Winer, 1962),

$$C = \frac{\text{largest of } k \text{ treatment variances}}{\sum \text{treatment variances}},$$

as the $F_{\max}$, uses as its parameters the k treatment groups and n - 1, the
degrees of freedom for each of the treatment variances, thus depending
upon equal groups, and resulting in a slight positive bias when the
largest n of slightly unequal groups is used. There is a special table
for interpretation. Because this test uses more information, is more
sufficient, it is more sensitive than the $F_{\max}$ test; the practical import
of this fact, however, is negligible for the purposes of this
paper, as will be noted in the general discussion of sensitivity to
non-normality and appropriateness of preliminary testing. However,
conceptually, it appears that this statistic is still insufficient in that
the variability of variances other than the largest, which is used as
the numerator in the equation, is taken into consideration in only a
secondary way, through examination of their totality; it is conceivable
that a more sufficient statistic would take into account proportion
and/or distribution of the k-1 variances. For example, two different
five cell designs might have the same largest within cell variance and
the same total variance, but the other k-1 variances in the two designs
might be quite different. 10,10,1,1,1 and 10,4,2,4,3 would each have
a largest within cell variance of 10 and a sum of within cell variance
of 23, but they do not share similar proportionality or distribution
of variance.
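The two five cell designs in this example can be checked directly. The following is a minimal sketch in Python; the function names `cochran_c` and `f_max` are hypothetical labels chosen for the illustration.

```python
def cochran_c(variances):
    """Largest treatment variance divided by the sum of all of them."""
    return max(variances) / sum(variances)


def f_max(variances):
    """Largest treatment variance divided by the smallest."""
    return max(variances) / min(variances)


design_a = [10, 10, 1, 1, 1]
design_b = [10, 4, 2, 4, 3]

# Both designs give the same Cochran ratio, 10/23, although the
# remaining four variances are distributed quite differently.
c_a, c_b = cochran_c(design_a), cochran_c(design_b)

# The range-type ratio does separate them here (10/1 versus 10/2),
# but only because their smallest variances happen to differ.
fmax_a, fmax_b = f_max(design_a), f_max(design_b)
```

Neither ratio reflects how the intermediate variances are spread, which is the insufficiency noted above.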
The Bartlett test (Winer, 1962) is

$$\chi^2 = \frac{2.303}{c}\left(f \log MS_{error} - \sum_j f_j \log s_j^2\right)$$

where

$f_j = n_j - 1$ = df for $s_j^2$

$f = \sum_j f_j$ = df for $MS_{error}$

$c = 1 + \frac{1}{3(k-1)}\left(\sum_j \frac{1}{f_j} - \frac{1}{f}\right)$

$MS_{error} = \frac{\sum_j SS_j}{\sum_j f_j}$

k = no. of treatment groups.

This statistic, referred to the $\chi^2$ distribution with k-1 degrees of
freedom to determine significance, is considered by Winer (1962) to be
"sufficiently sensitive" for use in those situations which require a
preliminary test. However, he discourages such usage except in "a
relatively few cases" [p. 95]. He considers the $F_{\max}$ and the Cochran
to be adequate for most needs. The sampling distribution of the ratio
arithmetic mean/geometric mean which the Bartlett test uses has a smaller
standard error than the sampling distribution of the range of the sample
variances which the $F_{\max}$ test uses; thus the greater power for the
Bartlett test (Winer, 1962). The fact that the test allows for unequal
n's makes it more useful than the $F_{\max}$ and the Cochran; the laborious
calculations involved discourage its use.
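The Bartlett calculation is tedious by hand but short in code. The sketch below, in Python, assumes the usual form of the statistic with common logarithms, the constant 2.303 (approximately the natural log of 10), and SS_j taken as f_j times the j-th sample variance; the function name `bartlett_chi2` and the data are hypothetical.

```python
import math
from statistics import variance


def bartlett_chi2(groups):
    """Return (chi-square statistic, df) for Bartlett's test.

    Unequal group sizes are allowed. Every group needs a nonzero
    variance, since logarithms of the variances are taken.
    """
    k = len(groups)
    f_j = [len(g) - 1 for g in groups]      # df for each variance
    s2_j = [variance(g) for g in groups]    # within-group variances
    f = sum(f_j)                            # df for MS_error
    ms_error = sum(fj * s2 for fj, s2 in zip(f_j, s2_j)) / f
    c = 1 + (sum(1 / fj for fj in f_j) - 1 / f) / (3 * (k - 1))
    log_terms = sum(fj * math.log10(s2) for fj, s2 in zip(f_j, s2_j))
    chi2 = (2.303 / c) * (f * math.log10(ms_error) - log_terms)
    return chi2, k - 1


# Hypothetical data with unequal n's.
unequal = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    [1.0, 1.0, 2.0, 3.0, 3.0],
]
chi2_unequal, df_unequal = bartlett_chi2(unequal)

# Two groups that differ only in location have identical variances,
# so the statistic is zero apart from rounding error.
shifted = [[1.0, 2.0, 3.0, 4.0, 5.0], [11.0, 12.0, 13.0, 14.0, 15.0]]
chi2_shifted, df_shifted = bartlett_chi2(shifted)
```

The statistic would then be compared with a tabled chi-square value on k-1 degrees of freedom.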



The sequential analysis method of statistical inference provides
an alternative method of examining the data. Wald (1947) discusses
application of his method for testing hypotheses about the variance.
The test assumes known population means, but there is
a modification of the procedure for unknown population means. An
initial objection to common usage of this test is the necessity of
working in a new and different frame of reference than that encountered
by most elementary practitioners of applied educational statistics, with
the result that probably it would be ignored. A second objection is
that it appears to be fairly complicated to compute. A third is that
Wald discusses his system of sequential analysis in terms of a normal
distribution which, as will be pointed out, makes it inappropriate as
a preliminary test.

Bartlett and Kendall (1946) developed a test for homogeneity of 
variance which utilizes the logarithms of variance estimates. 

According to Box (1953) this test also depends on a normal distribution. 

Box and Andersen (Box, 1953; Box and Andersen, 1955) report that
in tests such as the Bartlett, Cochran, sequential, and the
Bartlett-Kendall, one compares the variation of samples with a
theoretical variation rather than with another internal measure, such as is
done when between and within group means are compared. This results in
these tests being heavily dependent upon a normal distribution.
Examples in Table 1, taken from Box (1953, p. 320), illustrate the
effect of non-normality, specifically in terms of the Bartlett test.
Within a leptokurtic population (one with a peaked distribution),
differences will tend to be manifest where none exist; as indicated in
Table 1, a kurtosis of 1 will result in the 5% level shifting to the
11% level for a two group design and to the 17.6% level for a five
group design. Within a platykurtic population (one with a flattened
distribution), differences which are real will tend not to be made
manifest; a kurtosis of -1, as shown in Table 1, will depress the
5% level to a .56% level for two groups and to a .08% level for five
groups. Thus, significant results obtained with these tests may just
as easily indicate a non-normal distribution as lack of homogeneity.

Box suggests a need to studentize the fourth moment as the second
moment has been studentized for the test on means.
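The size of this effect can be illustrated numerically. The sketch below, in Python, rests on an assumption that is brought in for the illustration and is not stated in the text above: that in large samples the Bartlett statistic behaves like (1 + kurtosis/2) times a chi-square variate on k-1 degrees of freedom. The names `chi2_sf`, `true_percent`, and `gamma2` are hypothetical, and the critical values are ordinary tabled 5% points.

```python
import math

# Tabled upper 5% points of chi-square for 1, 2 and 4 df.
CHI2_05 = {1: 3.8415, 2: 5.9915, 4: 9.4877}


def chi2_sf(x, df):
    """Upper-tail chi-square probability (closed forms, df 1, 2, 4 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 4:
        return math.exp(-x / 2) * (1 + x / 2)
    raise ValueError("this sketch implements df = 1, 2, 4 only")


def true_percent(k, gamma2):
    """Approximate true percentage chance of exceeding the nominal 5% point.

    Assumes the statistic is distributed as (1 + gamma2/2) * chi-square,
    so the nominal critical value is rescaled before taking the tail area.
    """
    df = k - 1
    return 100 * chi2_sf(CHI2_05[df] / (1 + gamma2 / 2), df)


# Rows: kurtosis 1, 0, -1; columns: 2, 3 and 5 groups.
table = {g: [true_percent(k, g) for k in (2, 3, 5)] for g in (1, 0, -1)}
```

Under that assumption the computed percentages fall close to the figures quoted above, for example roughly 11% and 17.6% for two and five groups at a kurtosis of 1.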

Box and Andersen (1955) developed a test which is a modification
of the F and Bartlett tests, based on permutation theory, which
provides an approximate size alpha even in cases of non-normality. Their
test is based on the fourth moment; it determines a correction factor
for the degrees of freedom. Their data indicate that their method
is adequate for normal, rectangular and double exponential distributions.
However, equal cell sizes were used.

A test developed by Levene (1960), which he proposed as an
alternative to the Box and Andersen test, in part because it may have greater
applicability, uses the standard analysis of variance techniques on
$z_{ij} = |x_{ij} - \bar{x}_{i.}|$, the absolute differences between
$x_{ij}$ and $\bar{x}_{i.}$. He explored the use of z, $z^2$, the log of z,
and a further transformation of z; z and its square behave
best in his analysis, his preference being z because of ease of
computation. Examining the test under normal, uniform, double
exponential and a bizarre C distribution (a misrun of the double exponential),
he found that his test has power comparable to the Box and Andersen,
although the Box and Andersen alpha levels are slightly better. As with
the Box and Andersen, his tests involved equal sample sizes.
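The Levene procedure amounts to an ordinary one way analysis of variance run on the absolute deviations from the group means. A minimal sketch in Python follows; the function names `one_way_f` and `levene_f` and the data are hypothetical, and only the plain z version is shown.

```python
from statistics import mean


def one_way_f(groups):
    """Ordinary one way ANOVA; returns (F, df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_b, df_w = k - 1, n_total - k
    # ss_within must be nonzero for the ratio to exist.
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


def levene_f(groups):
    """ANOVA on z = |x - group mean|, the absolute deviations."""
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    return one_way_f(z)


# Hypothetical data: two groups of five, the second twice as spread out.
groups = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
]
f_ratio, df_b, df_w = levene_f(groups)
```

The resulting ratio is referred to the usual F table on the between and within degrees of freedom, which is the ease of computation that makes z attractive.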
Of the tests examined thus far, it appears that only the Box and
Andersen and the Levene are not sensitive to non-normality, and
the studies of them have been for equal n's. Scheffé (1959) suggests
a test to determine if there is inequality of variance which, as he
points out, is appropriate not for preliminary testing, with which this
paper is concerned, but for those cases in which the primary concern
is differences between variances.

The purpose of this paper is to examine tests which might be used
to test homogeneity of variance as it is an assumption underlying the 
analysis of variance. Various tests have been examined which have been 
suggested for use as preliminary tests; now an examination will be made 
of the circumstances in which they might be needed. 

The concern for homogeneity involves Model I, but not Model II, 
that is the fixed effects model, but not the random effects model. This 
is so because in the random effects model one assumes that there is only 
one distribution of errors with a given variance. The errors must, 
however, be independent of each other and the treatments (Hays, 1963). 

The random effects model is sensitive to departures from normality
(Kendall, 1966).

Within Model I (fixed effects) it has been noted by Box (1954) that
moderate inequality of variance does not have serious effects providing
that the cell sizes are equal. For example, with three groups having
n's of five, if the ratio of the group variances is 1:2:3, the
probability of exceeding the 5% point is 5.58%; if the ratio of the group
variances is 1:1:3, the probability of exceeding the 5% point is 5.87%.
One, therefore, does not need to test for homogeneity in the fixed effects
model if one has equal cell sizes. The primary area of concern that
remains is the case of unequal cell sizes in the fixed effects model.
The tests which have been reviewed here either demand equal cell sizes
or are sensitive to departures from normality.
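The robustness of the F test with equal cell sizes can be explored by simulation. The following Python sketch is a rough Monte Carlo check, not a reproduction of the Box (1954) derivation: it assumes normal populations with equal means, three groups of five, and a tabled 5% critical value of 3.885 for F on 2 and 12 degrees of freedom. The function names, seed, and number of replications are arbitrary choices made for the illustration.

```python
import random

F_CRIT = 3.885  # tabled upper 5% point of F on 2 and 12 df (k = 3, n = 5)


def anova_f(groups):
    """Ordinary one way fixed effects F ratio."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def rejection_rate(variances, n=5, reps=20000, seed=1):
    """Proportion of null samples whose F ratio exceeds the 5% point."""
    rng = random.Random(seed)
    sds = [v ** 0.5 for v in variances]
    hits = 0
    for _ in range(reps):
        groups = [[rng.gauss(0.0, sd) for _ in range(n)] for sd in sds]
        hits += anova_f(groups) > F_CRIT
    return hits / reps


rate_123 = rejection_rate([1, 2, 3])  # variance ratio 1:2:3
rate_113 = rejection_rate([1, 1, 3])  # variance ratio 1:1:3
```

With equal n's the simulated rejection rates should stay in the neighborhood of the nominal 5%, subject to sampling error in the simulation.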

Prior to suggesting a solution to the problem of what to do 
if one suspects heterogeneity of variance with unequal cell sizes when 
one wishes to do an analysis of variance, it might be well to stop 
and examine whether or not one should ever run a preliminary test, 
even if one exists that meets the requirements. The results of a 
preliminary test will depend on the power of the preliminary test. 

Box and Andersen (1955) suggest that the concern should be the
robustness of the main test. In a sense, one is removing oneself by
another step from the problem when one runs a preliminary test. In
the practical world of today, where most tests are sensitive to
departures from normality, this means that one could reject the null
hypothesis of homogeneity of variance, and therefore not run an analysis
of variance or play with the data prior to running it, when actually
homogeneity existed but there was a departure from normality.

Box and Andersen (Box, 1953; Box and Andersen, 1955) suggest
that the answer to the problem of possible lack of homogeneity of
variance with unequal cell sizes in the one way design is the Welch
criterion (Welch, 1951), which uses a weighted variance in place of the
pooled variance: $\sum_t w_t(\bar{y}_t - \hat{y})^2$, where
$w_t = n_t/s_t^2$. According to Box and
Andersen (1955), this modified criterion "would be expected to be
insensitive to differences in groups variances (and by analogy with the
standard test) to departures from normality also" [p. 3].
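The Welch criterion is

$$\frac{\sum_t w_t(\bar{y}_t - \hat{y})^2 / (k-1)}{1 + \frac{2(k-2)}{k^2-1}\sum_t \frac{1}{f_t}\left(1 - \frac{w_t}{\sum_t w_t}\right)^2}$$

where

t = treatment

$n_t$ = number of individuals within treatment t

$s_t^2$ = individual within treatment variances, estimated on df $f_t$
(one less than number of replicates in each case)

k = number of treatments

$w_t = n_t / s_t^2$

$\hat{y} = (\sum_t w_t \bar{y}_t) / (\sum_t w_t)$

$\bar{y}_t$ = treatment mean

$f_1 = k - 1$

$f_2 = \left[\frac{3}{k^2-1}\sum_t \frac{1}{f_t}\left(1 - \frac{w_t}{\sum_t w_t}\right)^2\right]^{-1}$

Refer to the variance ratio table with df $f_1$ and $f_2$.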
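The Welch computation can be sketched as follows in Python. This is an illustration under the definitions listed above, with weights equal to group size over group variance; the function name `welch_criterion` and the data are hypothetical, and each group is assumed to have a nonzero variance.

```python
from statistics import mean, variance


def welch_criterion(groups):
    """Return (criterion, f1, f2) for a one way design with unequal variances."""
    k = len(groups)
    n = [len(g) for g in groups]
    f = [nt - 1 for nt in n]                   # df of each variance
    ybar = [mean(g) for g in groups]           # treatment means
    w = [nt / variance(g) for nt, g in zip(n, groups)]
    w_sum = sum(w)
    y_hat = sum(wt * yt for wt, yt in zip(w, ybar)) / w_sum  # weighted mean
    # Sum over treatments of (1 - w_t / sum w)^2 / f_t, used twice below.
    lam = sum((1 - wt / w_sum) ** 2 / ft for wt, ft in zip(w, f))
    numerator = sum(wt * (yt - y_hat) ** 2 for wt, yt in zip(w, ybar)) / (k - 1)
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    f1 = k - 1
    f2 = (k ** 2 - 1) / (3 * lam)
    return numerator / denominator, f1, f2


# Hypothetical data: two groups with unequal n's and unequal variances.
groups = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
]
stat, f1, f2 = welch_criterion(groups)
```

With only two groups the criterion reduces to the square of the unequal-variance t statistic, which gives a convenient check on the arithmetic.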

Scheffé points out that computations are difficult in a weighted analysis
beyond the one-way level because of the loss of orthogonality involved.

In a telephone conversation, Gene Glass concurred with the author that at
this point there is no way of testing the assumption of homogeneity when
one has more than a one way design.

A question which remains is whether or not a person running
experiments in education should be concerned with a possible lack of
homogeneity of variance aside from possible effects it may have on the
ANOVA. It would seem that if members had been randomly assigned to
treatment groups, a lack of homogeneity of variance in a post test
situation might be of importance in and of itself. Our elementary text
books teach about means and about testing for the difference between
means; perhaps it is a mistake to make the logical inference that because
text books teach this, this is all one may find in experimental
results. A lack of homogeneity of variance may not occur very often,
and considering its implications in terms of simple analyses, we may
be grateful that it does not; however, possibly there should be greater
emphasis on the fact that when it does occur, it might be a
treatment effect on within cell variances.

Table 2 is a review of those areas in which the assumption of
homogeneity is important and notes ways of handling the problem. The
assumption of homogeneity of within cell variance is a concern when one
has unequal n's in a fixed factor one way design. In this case, one
should run the Welch criterion if heterogeneity is suspected; Box (1953)
suggests using the Welch criterion whenever heterogeneity is suspected,
including the case of equal n's.
TABLE 1

True Percentage Chance of Exceeding 5% Level
in Large Samples from Non-normal Populations

                 No. of groups
  γ2         2         3         5

   1       11.0      13.6      17.6
   0        5.0       5.0       5.0
  -1        0.56      0.25      0.08

γ2, kurtosis, is a measure of normality

TABLE 2

Suggestions for Handling Assumption of Homogeneity
of Within Cell Variance in a One Way Design

Model     Cell type      Suggestions

fixed     equal n's      violation not serious
          unequal n's    use Welch criterion

random    equal n's      homogeneity of within cell
                         variance is not a concern
          unequal n's    homogeneity of within cell
                         variance is not a concern